My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.
"I just (almost) finished reading AN INTRODUCTION TO DATABASE SYSTEMS, 8th Ed. and I have a bit of a problem with view updatability. I'll try and explain with an example and I would be pleased if I could get some clarifications.
Let R1 be a base relation (NAME,QUALITY), with the semantics that every occurring tuple indicates that the person with the name in question has the quality in question. That is, the entity rule in the corresponding relation predicate (RP) is something like "Person identified by name (NAME) has quality (QUALITY)."
Let R2 be a similar base relation (NAME, FUNCTION), with the corresponding rule "Person identified by name (NAME) has function (FUNCTION)."Tuples in R1 could indicate that "Chris Date has smartness.", or "George Bush has linguistic difficulties." Tuples in R2 could indicate that "Chris Date has the function database lecturer", or "George Bush has the function of President of the United States.".
These two relations can be joined, yielding a view whose rule would be something along the lines of "Person identified by name (NAME) has quality (QUALITY) and has function (FUNCTION)." One tuple's proposition in this view might be that "Chris Date has smartness and the function of database lecturer."
Now suppose that Chris Date is tired of being misunderstood, and decides to retire. Implying that Chris Date no longer has a function whatsoever. A user might now observe that the proposition contained in his database (view), is no longer true. He might want to correct this and (if the update is done through this join view) he can obviously only do so by deleting the join tuple that is now false. Now, if I understand your proposed join-delete algorithm correctly, then the system should react by removing both the assertion regarding Chris Date's function *as well as* the assertion regarding Chris Date's quality? Frankly, I can hardly believe this is for real ... Stop giving lectures does not imply stop being smart and vice-versa.
More generally: deleting a tuple from a join (or intersect or difference for that matter) view, means in fact that the user is saying that "it is not true that both P(A) and P(B) are true."--where P(A) and P(B) are both propositions related to the tuples of the base relations that are joined in the view. Nothing more and nothing less. To deduce from this fact that "both P(A) and P(B) are false" (which is indeed what we mean when we delete "the A part from A and the B part from B"), simply looks like "false reasoning" to me (well, at least in my school days it was labeled that way). I therefore have come to believe that "view updatability" should be allowed to happen IF AND ONLY IF the system has PRECISELY ONE WAY to "resolve" the user's request. You state that this is "highly desirable", but I believe that's not good enough. I believe it should be a sine qua non."
C. J. Date
First of all, let me say that I no longer regard view updating as a fully solved problem. A year ago or so I thought it was--but then Hugh Darwen started to ask some hard questions and I realized I was wrong. (David McGoveran will probably disagree with me here.) That said, I remain optimistic that the problem is solvable.
That said, I remain optimistic that the problem is solvable. The discussion in my 8th Ed. is generally along the right lines, though it gets some of the details wrong. One important piece of the puzzle that wasn't mentioned in the 8th edition is what I and Darwen introduced as The Assignment Principle (Ed. Note: For CJD's current position see [8]).
Now let me address some of the specific points you raise. The first has to do with updating joins. For reasons that should be obvious I don't want to use your specific example here; let me switch to the well-known suppliers-and-parts example. Let SSP be the join of relvars S and SP over S# (a one-to-many join; your example was many-to-many, and I'll get to that case in a moment). Consider an attempt to delete just the tuple for supplier S1 and part P1 from SSP. The rules say: Delete the tuple for S1 from relvar S and delete the tuple for S1 and P1 from relvar SP. So what happens? There are three possibilities:
- If there are no other shipments for supplier S1, the overall operation succeeds.
- If there are some other shipments for supplier S1 and no other updates are done (i.e., there are no side effects), the overall operation fails on a referential integrity violation.
- If there are some other shipments for supplier S1 and deleting supplier S1 from relvar S "cascades" to delete those shipments from relvar SP, the overall operation fails on a violation of The Assignment Principle.
DELETE ssp WHERE s# = s#('s1') AND p# = p#('p1');is inherently unsafe, since we presumably don't know, in general, whether there are any other shipments for supplier S1. The following operation, by contrast, is safe:
DELETE ssp WHERE s# = s#('s1');(it will delete supplier S1 from S and all shipments for supplier S1 from SP).
Now, you say, correctly (though I paraphrase), that to infer (NOT PA) AND (NOT PB) from NOT (PA AND PB) is "false reasoning." I agree. But I never claimed--at least, I don't think I ever claimed!--that the rule for deleting from a join was based purely on logical inference. However, I do claim that (a) it's a simple rule, (b) it's a reasonable rule, (c) it gives useful and expected results in the majority of cases, and (d) it gives predictable results in all cases. I also claim that the rule is both (e) symmetric in itself and (f) symmetric with respect to the corresponding insert rule. Finally, I claim (g) it's desirable and indeed advantageous to have a rule that doesn't depend on whether the join is one-to-one, one-to-many, or many-to-many. (Which takes care of the many-to-many case as promised.)
David McGoveran
Without writing a paper, I do want to make a few comments in response to the objections to view updatability, at least insofar as it addresses the approach proposed by Chris Date and myself.
Let me note up front that Chris and I may not see eye-to-eye regarding view updatability if his published expositions are indicative. I suspect this divergence is largely a matter of some details of the approach which Chris more or less downplays, and which then permit many confusions (and objections) to arise whether by Hugh Darwen, the recent correspondent, or others. Unfortunately I have not been in a position to pursue clarification with Chris, so please do not take this any kind of indictment.
View updatability is of crucial importance to the relational model. If there is no solution, then there is likewise no possibility of data independence (both physical and logical!). That would be tantamount to disclaiming the relational model itself, insofar as data independence was the stated initial motivation of Dr. Codd's relational model research program.
I believe that a solution to the view updatability "problem" is not only possible, but that it is a solved problem. I do not believe that the problem has been formulated well in the literature, nor do I believe that the solution has been well and completely explained. (I'm committed to getting this done before my "function" is defunct, hence much focused work on the forthcoming book. Furthermore, I believe that the consequences of understanding and applying the solution are wide-reaching, so much so that we might have to acknowledge that we never truly understood the relational model heretofore.
A complete explanation of the solution will be offered in my forthcoming book [9]. Here I will have to be satisfied with providing a few principles, recommended conceptualizations, or guideposts. Throughout, I've placed quotes around certain words or phrases in lower case, because they could use some further explanation, might be over-interpreted, or otherwise are imprecise. I apologize in advance for not being able to do better by this important subject here. A detour at this time into a partial explanation of view updating would mean a significant delay in writing the more complete explanation in the book, a delay I am not willing to entertain at this juncture.
- The "Date-McGoveran view update approach" (or, in more concrete form, the algorithm) provides a set of rules for applying updates to any relation.
- These rules are not to be applied in isolation. They are, in a sense, context dependent, where the context is supplied by (1) constraints accessible to the DBMS and (2) the current user's intent as expressible through (a certain strictly constructive and strictly finite) first order predicate logic. The simple expression of the rules only tell us what to do in the absence of further contextual information.
- A DBMS that supports the relational model cannot be "magic". It cannot compensate for ambiguity, intuit withheld information or assumptions, or correct expressions that violate the designer's or user's intent. If the designer fails to capture information in database design, if information is hidden from the user, or if the users incompetently fail to express their intent [Ed. Note: Or the data language is not sufficiently expressive to allow it], it will certainly produce seemingly anomalous or "surprising" results when faced with these problems [Ed. Note: Semantic correctness is not guaranteed].
- No logical independence mechanism can be permitted to hide relation semantics from users (the contrary would be absurd), although it seems that many writers have assumed that to be part of its intent. The derived semantics of a view is inherently distinct from that of a base relation. For example, the relation predicate for a join view necessarily involves multiple irreducible predicates, whereas that for a base relation should not [Ed. Note: Which is another way of saying the design of base relations must adhere to the Principle of Ortogonal Design).
- Logical independence can hide base data and base data representation. For example, a projection view can hide the specific data found in the excluded attributes even though the predicate clearly tells us that those attributes, and data in them, must exist. Additionally, the view might not then contain any evidence of dependencies among those attributes and attributes in other relations.
- For logical independence to be possible, the semantics of relational operators must be consistent, whether the operands are derived or base relations. Although we have traditionally thought of this as requiring that we find some algorithm for view updating, this is a poor method of attack, which seems obvious when we consider that derived relations necessarily have the more complex semantics. Thus, we must solve the general case of update semantics for which updating base relations is the special case (and not the reverse).
- All relational updates should be understood as set transformations: Therefore, they are atomic, and the final "state" is not dependent on the order of any intermediate steps. Just as is the case when evaluating a logical expression (e.g., a tautology, or a relation predicate), there are no implementation dependencies (as, for example, in treating the left vs. right "MINUS" first)--only implementation errors. An optimizer incorporating such implementation dependencies is faulty.
- We must avoid the temptation to think of a relational update as anything other than a declarative specification of a resulting relation (a set) incorporating the relation predicate of source relation and its current value. Nothing else is pertinent to computing the value that is to be "assigned" to the relation. If a reference to one or more tuples occurs within the update expression, we should not think even of these as anything other than a particular way of expressing a constraint (i.e., as part of an "update predicate").
- Correct database design is crucial for understanding view updatability (Chris Date will possibly demur on this point) and goes well beyond full normalization or even our POOD. [Ed. Note: There are three design principles, see [2]]. True, the results will always be predictable and deterministic, but are likely to produce confusing side effects from time-to-time.
I disagree with Chris' apparent conclusion that view updating need not be based solely on logical inference; I insist that it can and should be. Many of the problems that have been raised regarding our proposal have to do with how predicates are to be interpreted, including the predicates that the user conveys to the DBMS in terms of insert, update, and delete. Put another way, our "updating language" is not sufficiently expressive to convey the user's intent accurately. As discussed in[8] (although the reader is cautioned that I did not write or edit the proposal that is attributed to me and so it is not completely accurate), this deficiency can and should be remedied.
With these principles in mind, reconsider the "problem" with the example of delete through a join view. The "betrayal" of the user is an illusion and ceases to be a problem. In fact, it is a problem which we accept all the time for base relations: We can easily construct a database of base relations for which there are "side effects" (remember cascade delete?), or which prevent the user from obtaining update results that behave as if the target relation exists in isolation.
The "problem" of finding a unique inverse transformation for a join view in a given context without specifying any context goes away. Clearly, the delete must satisfy 'NOT PA OR NOT PB'. (Aside: We must be very careful what we mean by NOT, avoiding confusion with simple complement. Relations have a relative complement--tuples not asserted to be 'true'--with a scope that is distinct from that of the relation predicate--tuples that cannot belong to the relation [Ed. Note: Representing false instantiations of the predicate]). This requirement may be further constrained in at least one of two ways.
- If they know, users should specify their intent as to why the conjunction is 'false'.
- Some of the integrity constraints and dependencies the designer imposes may further constrain the general requirement.
References
[1] Pascal, F., Denormalization for Performance: Don't Blame the Relational Model
[2] Pascal, F., Relation Predicates and Identical Relations
[3] Pascal, F., The Principle of Orthogonal Database Design Part I
[4] Pascal, F., Principle of Orthogonal Database Design Part II
[5] Pascal, F., The Principle of Orthogonal Database Design Part III
[6] Comments on "Database Design Relation Predicates and Identical Relations"
[7] Comments on "Denormalization for Performance: Don't Blame the Relational Model"
[8] Date, C. J., VIEW UPDATING AND RELATIONAL THEORY (O'Reilly Media, 2013)
[9] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming.
(Based on the join examples in his 2012 updating book, I think CJ Date's comments would be different today, even if that book persists with the 'reasonable' motive, which is unnecessary as D. Mcgoveran points out when he says that only logical inference is necessary.)
ReplyDeleteThe OP's comments reflect a procedural interpretation of relational algebra that most or all dbms implementers are prey to, for example saying that join view deletions are deletes 'through' views but base table deletes are deletes 'from' base tables, which is a foible that has file system heritage. They approach both examples as if the problem is to decide what base table updates must have produced the desired view's value, whereas the question is what base values are implied by the desired view value. This is the result of interchanging a relation having a predicate that is not the predicate of the join's projections on each of the join operands' headings, for example pretending that PA is the predicate of ( A JOIN B ) { set of A's attributes}. Instead it is only a subset of A that has the predicate of such a projection on the join and the only question is what projections of the join result from the absence of the 'deleted' row. It's not the values of the A and B tables that need to be inferred, but the values of the respective subsets of those tables that are the effective operands of the join. For any particular join those subsets have only one solution. The subsets that DON'T affect the join's value (subsets that are NOT in the respective projections) are necessarily unchanged by a join deletion.
Of course (NOT PA) OR (NOT PB) follows from the solution but it's a mistake to think of the expression as a recipe rather than one side of a (declarative) tautology. People got away with the mistake in the file system era but when views are allowed the more exact expression is (NOT PAJ) OR (NOT PBJ) where PAJ and PBJ project the join on A and B attributes respectively.
For the two examples in this exchange a Venn diagram makes it graphic that ( PA AND NOT PB ) AND ( PB AND NOT PA) must be empty and that each of these two conjuncts is disjoint compared to the predicate of the mixed set of tuples in the conjunction given by the join's predicate, PA AND PB. Join deletion doesn't require either logical conjunct to be replaced but arguments are often made to the effect that they could be replaced. These arguments use such an amorphous notion of deletion by implementers that a DELETE command sometimes means INSERT! D. McGoveran's updating chapter gives precise definitions that prevent this mistake.
Another mistake is thinking that correct implementation of join deletions implies that at least one tuple from each operand must always be deleted or that this implies that all insertions to unions have only one solution.
DBMS implementers need to understand that support of such subsets is equivalent to support of relations and vice-versa. Neither results from support of only base relations aka base tables/relvars. Furthermore, in current systems the effect is that only base relations have predicates!
Join deletions are so important not just because they have only one solution if implemented correctly, ie., implemented on a purely logical basis aka declarative basis but because their support would change drastically the data design landscape and thereby go a long way toward the ultimate benefit of data independence, preserving the value of a data investment solely by means of logical structure. Of course this is contrary to many vested interests such as those of the big commercial products and the career investments of multitudes of developers.
David McGoveran replies:
DeleteI agree and fall prey to this procedural terminology myself, which is deeply ingrained in both relational and SQL “culture”. (As I have pointed out, the term “base table” referred to a set of tables that satisfied certain very important logical requirements, and only came to be equated to physical storage because it was natural to at least store the data represented in base tables, if not in certain derived tables as well.) It is, in fact, why I chose to characterize the algorithm involved as a universal algorithm for updating relations (see my patent as well as my “Updating Relations” article), whereas the 1994 articles focused on “view updating” – and CJD still thinks this way if the 2013 book is any indication.
Thank you for grasping that about the two examples! The apparent ambiguity as to which underlying relations to apply the delete to is due not to an ambiguity in the algorithm, but to the user incompletely conveying their intent and knowledge. A hint as to how to fix this language problem is given in an appendix of CJD’s 2013 book on Updating Views.
Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.
ReplyDeleteDate and McGoveran's first joint treatment of view updating [1994, published on dbdebunk at the time] was indeed confused in its usage of "predicate". I suspect this can be attributed to the authors failing to understand each other. Both authors have since disavowed that work; and quite understandably: it is dreadfully confused.
CJD has since clarified, and uses "relvar predicate" to mean the 'external predicate' of a relvar.
DMG [and Pascal usually follows] uses 'relation predicate' to mean the 'internal predicate'/constraint on allowable content -- at least that's my best understanding.
So for [Ed] to gloss DMG's "tuples that cannot belong to the relation" as "Representing false instantiations of the predicate" is reintroducing the confusion so evident in the 1994 material.
I think that by "tuples that cannot belong to the relation", DMG means tuples (taken as a set presented for update) for which the relation constraints do not hold; not tuples whose corresponding propositions do not hold in the world.
"for example saying that join view deletions are deletes 'through' views but base table deletes are deletes 'from' base tables, which is a foible that has file system heritage"
ReplyDeleteThat's funny. McG used the word "through" himself in the "David adds" section.
What's funny is that out of the long comment that's the only detail that caught your eye.
ReplyDeleteYour conclusion that it was the only one is unwarranted. But the others are so age-old and I've seen them so many times already that I've stopped bothering. Neither arguing nor asking for clarification will change a thing.
DeleteThe fact is that it was the only one you commented on.
ReplyDeleteAge is not a serious argument -- validity is. The problem is that you see everything through the CJD interpretation and, unfortunately, it is not properly, if at all, grounded in FOFL and set theory and one must be grounded in them to see the problems. That's what EFC's RDM was about and it is the ignorance of the dual theoretical foundation that is responsible for all the misconceptions and misunderstanding of the RDM.
It may have been the only one I commented on *HERE*, it certainly wasn't the only one I've ever commented on whenever/wherever this singer and tune popped up. Which is why I can confidently state that neither arguing nor asking for clarification will change a thing.
DeleteI understand.
DeleteBut my point was that your perception is colored by your RDM interpretation, rather than the formal foundation of the RDM. And in that context your comment was rather flippant, as I'm sure you realize.
Missed points make good fertilizer for rhetorical objections which are often the only way to defend a Not-Invented-Here position. It's one of several obfuscation techniques. The point wasn't that one word is more right but that only one should apply to relation updates. Pick the one you like.
DeleteFans of base relvars as well as any other language device that can be equated to physical structures or SQL implementers or file system coders usually are not able to understand data independence at all. Experience suggests that years of unlearning may be needed, the more coding experience the more years, which in turn might mean that only generational change will make any difference. McGoveran's 2004 statement that the RM is not well understood appears now to be more true than ever thanks to the accumulated obfuscations.
ReplyDeleteCodd would have had no difficulty accepting the specific deletion CJ Date describes above (see his book). While my own argument focuses only on join deletions rather than overall database consistency and is also less theoretical and more empirical than David McGoveran's, I've yet to see that it counters anything he's written and most of it was prompted by his explanations.
When I first used SQL-type products (in very basic ways even though they included every popular mainframe and Unix version) I had to wonder why a relational dbms didn't allow all user commands to address relations, instead these dbms'es addressed certain tables which were distinguished from other tables called views and none of the views could be explictly replaced by a user. Surely the material of the user interface must be relations, right? From what I'd seen of Codd's first and second papers, I couldn't understand why a dbms with the word 'relational' in its moniker didn't allow me to replace relations. After all, even a read-only database needs to replace empty declared relations before there is anything to query. As I saw more and more customer SQL databases I started to realize that all these products expected the user to program the database, the dbms didn't do that. So to replace some relations, for each one you had to find out what relations they were derived from and achieve the replacement you wanted via side-effects. There is now a whole sub-industry cluttered with experts to help users do that. Not only are applications more complicated than they need to be but there is no certain way to guarantee that every piece of application updating code is not making the intended data design inconsistent.
At present the so-called db schema and user-supplied data do not amount to the program they could amount to. Instead multiple app developers fill in the db program's missing code, repetitively and instead of executing the incomplete program much of what the dbms does is no more than regurgitating its input, the partial program as it were. .
... (continuation of earlier comment)
ReplyDeleteThe problem with base relvars as a user device for join deletions is that the logical set of tuples addressed by the name A is not the set addressed by the name "( A JOIN B ) { A attribute set })" even though they have the same heading. These two sets are disjoint and Codd's difference operator can calculate the value of only one of them (he called this relational closure). So if a relvar is used as a target device for replacing a relation either it must have a set membership function and relational expression and predicate that correspond to exactly one of the subsets or the dbms must be capable of addressing a subset of the relvar's tuples for which those three kinds of logical addresses agree. It might be that a certain implementation language allows users to express set differences such as "( A JOIN B) minus D" that don't preserve (A JOIN B) tuples that are not in D. As far as I know this was the only kind of join deletion Codd objected to and McGoveran's update definition appears to prevent such a language.
The arguments against join deletion are strange because nearly all assume a replacement relation target such as A or B instead of one that has the set membership function and relational value expression and predicate of the join. It's why I say that in current relvar systems only base relvars actually have predicates - the dbms can replace only the relations that have predicate PA or PB, not any of the other five possible relations (at least one of which could be updated)!
Another argument that claims join deletions cause Codd's 'deletion anomaly' and therefore join deletions should be outlawed. This is off-base because Codd's solution to that anomaly had nothing to do with set operations or updates but rather with data design. When join deletion is understood it will be seen that a much bigger door opens that demands much more sophisticated data designs (and probably more automation to support them).
Anthony and Toledo,
ReplyDeleteI seem to have lost the last comments you submitted. If you have the text, pls re-submit. If not, pls contact me by email.
Thanks.
(I had to chop this into three parts to make it fit.)
ReplyDeleteFans of base relvars as well as any other language device that can be equated to physical structures or SQL implementers or file system coders are frequetly unable to understand data independence at all. Experience suggests that years of unlearning may be needed, the more coding experience the more years, which in turn might mean that only generational change will make any difference. McGoveran's 2004 statement that the RM is not well understood appears now to be more true than ever thanks to the accumulated obfuscations.
Codd would have had no difficulty accepting the specific deletion CJ Date describes above (see his book). While my own argument focuses only on join deletions rather than overall database consistency and is also less theoretical and more empirical than David McGoveran's, I've yet to see that it counters anything he's written and most of it was prompted by his explanations.
When I first used SQL-type products (in very basic ways even though they included every popular mainframe and Unix version) I had to wonder why a relational dbms didn't allow all user commands to address relations, instead these dbms'es addressed certain tables which were distinguished from other tables called views and none of the views could be explictly replaced by a user. Surely the material of the user interface must be relations, right? From what I'd seen of Codd's first and second papers, I couldn't understand why a dbms with the word 'relational' in its moniker didn't allow me to replace relations. After all, even a read-only database needs to replace empty declared relations before there is anything to query. As I saw more and more customer SQL databases I started to realize that all these products expected the user not only to program the database in addition to the application but to specify parts of it on the fly, the dbms didn't do that. So to replace some relations, for each one you had to find out what relations they were derived from and try to achieve the replacement you wanted via side-effects. There is now a whole sub-industry cluttered with experts to help users do that. Not only are applications more complicated than they need to be but there is no certain way to guarantee that every piece of application updating code is not making the intended data design inconsistent.
It's also important for fulfilling Codd's idea to see a schema and data as a logical program which the dbms then executes. At present the so-called db schema and user-supplied data do not amount to the logical program they could amount to. Instead multiple app developers in effect fill in the db program's missing code, repetitively and instead of executing the incomplete program much of what the dbms does is no more than regurgitating its input, the partial program as it were.
(end of part 1)
The authors of SQL never understood the RDM and SQL has considerable responsibility for much of the misunderstandings, not to mention the failure to implement it. Much of the rest is due to the substitution of product training for education and fundamentals.
Delete(part 2 of 3)
ReplyDeleteRegarding relvars, following McGoveran's Venn diagram suggestion, suppose a database has two declared non-empty relations and abstract it as two partially overlapping circles. The diagram has seven non-empty subsets each having homogeneous tuples, representing seven disjoint relations with logically distinct predicates, in other words five more predicates than the two of the declared relations. (See the ironically titled web page athttps://blog.jooq.org/2016/07/05/say-no-to-venn-diagrams-when-explaining-joins/
for a graphic.)
Now assume that the seven subsets are disjoint in the sense that each is used in a different deletion situation corresponding to one of the seven different combinations of red and white circle fragments/shapes. Assume any deletion operation is independent of any other so that the deletion context always includes only one of the combinations. In this argument I'm only talking the join of the middle layout and the two shapes that comprise the A circle which happen to be disjoint as well in the usual set sense.
For example, you could let the values of the declared relations be named A and B and name their respective predicates PA and PB. Name the value of their set intersection (A JOIN B) and name its predicate P(A JOIN B). Now name the A members that are not sub-tuples of (A JOIN B), which in relational algebra will require projection, as A MINUS ( ( A JOIN B ) { A attribute set }). This name can be thought of as the logical address of the A-tuples that are not present in the joined relation and equally so could its predicate P(A MINUS ( ( A JOIN B ) { A attribute set })) and it could also signify the set membership function of this homogeneous tupleset. A logical address has no particular unique physical counterpart, for example the logical address named (A JOIN B) might have several because the joined tuples are pairs of sub-tuples or it might have no physical 'row' counterpart in terms of a single tuple. Viewed as a logical construction with no particular dbms implementation in mind each of the seven subsets exists as a set in its own right. Each could have a physical counterpart or each might not, that's immaterial to the logical structure.
A given sub-tuple might appear many times in the ( A JOIN B) subset. If it appears, it's guaranteed to not appear in the subset addressed by A MINUS ( ( A JOIN B ) { A attribute set }). (Also, a sub-tuple that fails to appear in a tuple-pair replacement of A join B might continue to appear as an A-tuple.) When I first saw online objections to join deletions, one of them was to the effect that no single solution was possible because relational logic allowed three possibilities, one of which removed a pair of sub-tuples from ( A JOIN B ) and inserted one of those sub-tuples into A MINUS ( ( A JOIN B ) { A attribute set }. I asked how could deletion mean insertion too, how does set difference include set union but I have never received an answer.
( part 3 of 3)
ReplyDeleteThe problem with base relvars as a user device for join deletions is that the logical set of tuples addressed by the name A is not the set addressed by the name "( A JOIN B ) { A attribute set })" even though they have the same heading. These two sets are disjoint in the customary sense as well as my assumed sense and Codd's difference operator can calculate the value of only one of them (he called this relational closure). So if a relvar is used as a target device for replacing a relation either it must have a set membership function and relational expression and predicate that correspond to exactly one of the subsets or the dbms must be capable of addressing a subset of the relvar's tuples for which those three kinds of logical addresses agree. It might be that a certain implementation language allows users to express set differences such as "( A JOIN B) minus D" that don't preserve (A JOIN B) tuples that are not in D. As far as I know this was the only kind of join deletion Codd objected to and McGoveran's update definition appears to prevent such a language.
The arguments against join deletion are strange because nearly all assume a replacement relation target such as A or B instead of one that has the set membership function and relational value expression and predicate of the join. It's why I say that in current relvar systems only base relvars actually have predicates - the dbms can replace only the relations that have predicate PA or PB, not any of the other five possible relations (at least one of which could be updated)!
Another argument claims join deletions cause Codd's 'deletion anomaly' and therefore join deletions should be outlawed. This is off-base because Codd's solution to that anomaly had nothing to do with set operations or updates but rather with data design. When join deletion is understood it will be seen that a much bigger door opens that demands much more sophisticated data designs (and probably more automation to support them).
Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.
ReplyDeleteDate and McGoveran's first joint treatment of view updating [1994, published on dbdebunk at the time] was indeed confused in its usage of "predicate". I suspect this can be attributed to the authors failing to understand each other. Both authors have since disavowed that work; and quite understandably: it is dreadfully confused.
CJD has since clarified, and uses "relvar predicate" to mean the 'external predicate' of a relvar. DMG [and Pascal usually follows] uses 'relation predicate' to mean the 'internal predicate'/constraint on allowable content -- at least that's my best understanding.
So for [Ed] to gloss DMG's "tuples that cannot belong to the relation" as "Representing false instantiations of the predicate" is reintroducing the confusion so evident in the 1994 material. I think that by "tuples that cannot belong to the relation", DMG means tuples (taken as a set presented for update) for which the relation constraints do not hold (or will not hold when the update is applied); not tuples whose corresponding propositions do not hold in the world.
"Date and McGoveran's first joint treatment of view updating [1994, published on dbdebunk at the time] was indeed confused in its usage of "predicate". I suspect this can be attributed to the authors failing to understand each other. Both authors have since disavowed that work; and quite understandably: it is dreadfully confused."
DeleteI will say that knowledge and understanding of FOPL are critical, because that is what Codd's RDM and McGoveran interpretation are predicated on and leave it at that.
"CJD has since clarified, and uses "relvar predicate" to mean the 'external predicate' of a relvar. DMG [and Pascal usually follows] uses 'relation predicate' to mean the 'internal predicate'/constraint on allowable content -- at least that's my best understanding."
These are CJD's, not FOPL terms. In FOPL there is a formal predicate. We can talk about
(1) its informal expression in natural language, which we call a business rule;
(2) its expression in a specific data language, which we call integrity constraint.
You choose to ignore the effects of relvars on FOPL and, therefore, on the RDM.
"So for [Ed] to gloss DMG's "tuples that cannot belong to the relation" as "Representing false instantiations of the predicate" is reintroducing the confusion so evident in the 1994 material. I think that by "tuples that cannot belong to the relation", DMG means tuples (taken as a set presented for update) for which the relation constraints do not hold; not tuples whose corresponding propositions do not hold in the world."
Yes. A database relation is a set of tuples that represent a set of instantiations of the predicate that satisfy the RP. Conceptually, a RP is derivable from the conjunction of all the constraints that apply to the relation, but those constraints must be
declarative and
translated into FOPL and/or a data language of same expressive power and
transformed according to a particular wff schema.
David's book will have one or more chapters on “Constructing Relation Predicates”.
Under the CWA, tuples that represent instantiations that do not satisfy the RP cannot possibly be true and are absent from the relation. There are false instantiations that satisfy the RP, but they are kept out of the relation by the authorized users trusted with updates. So a relation represents directly (by presence) true instantiations ond indirectly (by absence) false instantiations.
David McGoveran replies (Part 1)
DeleteFor the record, the original articles were not published at dbdebunk, but in Database Programming & Design June, July, and August of 1994. Material that appeared in dbdebunk that year was slightly different, in that we had more freedom to add or clarify bits. While both Chris and I have pointed out various bits published in the 1994 period with which we are dissatisfied, and I have pointed out certain misunderstandings between us that were not discovered until later, overall Chris and I stand by the approach we introduced to view updating. There are details that matter and which are incorrect in the 1994 publications. More unfortunate is the fact that the articles addressed a very complex topic in advance of a more formal publication (and the care that would have required) and those hurried efforts--frankly driven by our excitement over the discovery of the approach--led to considerable misunderstandings by our readers. I, for one, apologize.
As to the 1994 work being “dreadfully confused” I think that an overstatement. While it perhaps left many “dreadfully confused”, or was “dreadfully confusing” to some, some of that indictment can be seen as arising from the fact that it introduced a radical departure from how relations and updating were understood in the earlier literature. There were a couple of technical errors and some of the pedagogical tools Chris used were ultimately misleading given his interpretation at the time, but beyond that the articles successfully introduced a new way of attacking the view update problem. One problem worth acknowledging in the present context is that Chris always liked to appeal to symmetry when there was no obvious argument to resolve an apparent ambiguity in the update procedure. Rather than publish this approach, we should have resolved the apparent ambiguity on logical grounds, as I have subsequently done (and Chris attempted to convey in the appendix of his 2013 book).
Fabian: "You choose to ignore the effects of relvars on FOPL and, therefore, on the RDM."
DeleteIf you read my comments carefully (on another article), I'm not ignoring the effects of relvars on FOPL, I'm asking why DMcG thinks having relvars impedes FOPL. [And I'll have more to say there.]
Actually I'm by no means enamoured of CJD's take on relvars (I suspect he's smuggling in something more than them standing for a relation) -- particularly I don't see virtuals being relvars in the same sense as base relvars.
So if there's a reason in logic for rejecting relvars, or virtuals-as-relvars, I'm all ears. So far DMcG has not persuaded me.
My reason for not seeing virtuals as relvars is -- precisely -- when it comes to persistence and updating. A programming variable's value is not supposed to change except through specific update. And yet a virtual's value changes 'by magic' when its based-on changes.
So I would use the terminology delete "through" virtual vs. delete "from" base -- which dichotomy CJD is trying very hard to avoid.
"One problem worth acknowledging in the present context is that Chris always liked to appeal to symmetry when there was no obvious argument to resolve an apparent ambiguity in the update procedure. Rather than publish this approach, we should have resolved the apparent ambiguity on logical grounds, ..." [And in the qute from CJD above, "symmetry" appears a couple of times.]
DeleteI wholeheartedly agree: if (when we take all constraints into account) a JOIN is one:many, then it's not symmetric. If there's an Inclusion Dependency, then it's not symmetric.
Furthermore always deleting tuples from both operands of a JOIN is not symmetric with inserting to the JOIN. Inserting to a JOIN does not necessarily insert into both operands, but only if the tuples are absent from both.
>"I'm asking why DMcG thinks having relvars impedes FOPL."
DeleteDoes this help?
Functional Programming: The Failure of State
https://www.youtube.com/watch?v=7Zlp9rKHGD4
David McGoveran replies To Clayden's of Dec. 26 at 4:01AM:
DeletePrecisely what CJD is "smuggling in" with the use of relvars (consciously or otherwise) is the transubstantiation of relational algebra from a data sublanguage into a computationally complete language. The former may be a declarative language with an evaluation algorithm (decision procedure) that works for every expression in the language (so that it is decidable), while no computationally complete language can be decidable if we are to believe Goedel's results. Without a declarative language (which is what gives us the necessary level of abstraction), there can be no physical data independence. Without algebraic closure and a declarative language, there can be no logical data independence that builds upon physical data independence. Instead we will have to create and test (read "debug") a query evaluation algorithm for each and every specific database to which the query language is applied.
David McGoveran replies To Clayden's of Dec. 26 at 6:49 AM:
DeleteI was not clear about CJD's appeal to symmetry because I thought is clear from his writings (including our 1994 papers). It had nothing to do with the kind of asymmetry mentioned here by Anthony, but to the idea that if, for example, an insert, update or delete to a relational expression involving two relations R1 and R2 can achieve the desired change by modifying R1, R2, or both R1 and R2, then the last modification should be the preferred one. Similar rules would apply if the desired change could be obtained by modifying R1, or else both R1 and R2.
This kind of rule seems to arise from thinking of an insert, update or delete in terms of some "desired" or "imagined" or "appropriate" result, rather than in terms of the mechanics of constructing the update expression precisely and applying the expression logically and mechanically. From this one assumes that operations can be ambiguous, when in fact, logic is never ambiguous. It is only our expression of the operation that is imprecise or incomplete, and that often arises because we don't really understand the relational expression (corresponding to some combination of relations) to which we are applying the operation.
>>"I'm asking why DMcG thinks having relvars impedes FOPL."
Delete> Does this help?
>
> Functional Programming: The Failure of State
Thank you Fabian, but that you think the topic of that video is atall relevant contains so many misconceptions, it's hard to know where to begin. I won't begin, because DMcG's objections are raised wrt the 'static semantics' of information represented in relations. He hasn't touched on the dynamics of managing state -- and I'm not expecting he will: it would need concepts of persistence/identity of databases and relations, which have no place in FOPL.
I am sorry you don't see the relevance. I did not mean, of course, that the entire video was relevant to our discussion here. Take these points together and maybe you'll see:
Delete1. Codd's "TIME-varying relations"
2. Date's introducing ASSIGNMENT as part of a solution to view updating;
3. The video talks about a change in state from t1 to t2.
If this is still not clear, it indicates insufficient familiarity with FOPL and you will have to wait for the book. For which reason this discussion is closed.
David McGoveran replies (Part 2):
ReplyDeleteChris’ use of internal and external predicate have long been a subject of my dissatisfaction with his terminology. They convey--to my mind--a horribly misleading and disastrous way of thinking: namely, that there is some “meaning” in the minds of users to be given formal recognition and that there is some formal “meaning” that the DBMS “understands” which is an approximation of the former. Nonsense. It is the responsibility of the data modeler to capture the intended subject domain in a conceptual model and then to formalize that in a logical model. What is in some users head regarding what is or should be meant by some relation is completely irrelevant. The data modeler is to document each relation he has created and the RP is the formal meaning of each relation in the logical model. It is up to users of that logical model as exemplified in the database to interact with the database consistently with that meaning. That implies they should have access to the RP for each relation they access (whether for retrieval, deletions, updates, or insertions) in a form they can comprehend and be trained to interact consistently. This is what we mean by “authorized users”: users that have the knowledge and skill not to use the tool incorrectly or blindly. While we can use constraints to make certain that the database is always in a consistent state, we can’t do anything to make sure an authorized user puts the database in a state that is inconsistent with some aspect of the physical world that it allegedly represents and it is a fallacy to think we should even try. That fallacy is one Chris has unfortunately entertained by introducing internal vs. external predicates and confusing “meaning” as formal semantics with “meaning” as subjective perception and intent.
Regarding tuples belonging to a relation, this is a complex, subtle subject for which I don’t have time, but a few quick comments that may help. In the relational model, ANY n-ary tuple can be understood as an instantiation of a relation predicate corresponding to an n-ary relation. In the untyped FOPL, that instantiation results in a sentence which can be evaluated as having either a true or a false truth value. When we talk about tuple we have to be clear as to whether we are talking about
tuples belonging to the current extension of some specific relation (what CJD would call the value);
tuples that cause the specific relation’s constraints to evaluate to a specific truth value (true or false); tuples that would cause that constraints to evaluate to true, but are not part of the current extension.
As I have said many times, the RP is necessarily the conjunction of all the relation’s constraints (those that can be expressed without reference to other relations), but this is not sufficient--it is a little incomplete. To differentiate the current extension of the relation from other sets of tuples that could be an extension of the relation, we have to know what instantiations have been asserted. I capture this idea in a special, primitive predicate I call the “assertion predicate”. We therefore have to be very careful when we talk about a tuple being true or false and which tuples do or do not “belong to the relation”. (That said as background, I agree with what Fabian says insofar as it goes.)
Regarding "tuples that couldn't be members of a relation", the dreadful confusion is mostly in the eyes of some beholders and David McGoveran is remarkably polite/tolerant. It would be a thankless job to try to list all the reasons for inability to understand such a simple idea, eg., coding training instead of theory training, design rule-of-thumb memorization by coders, an internet so replete with irrelevant details like data type representations that practically no observer can see the forest for the trees, physical database experts pretending to implement relational ideas, newcomers copying the physical coders, etc, etc.
DeleteFor argument's sake, a database without logical inferences, where all relations are base and each has a single predicate or membership function could be imagined and then one could ask whether for some tuple in one of them, is the recording of any of the other relations contradicted or is every statement/proposition true or does each 'true' set satisfy its predicate. (Of course for a real situation such a system's usually impractical as Codd seems to have known which must have been part of his reason for replacing physical data arrangements that were effectively such systems, but it seems like a good question for understanding the idea.)
"They convey--to my mind--a horribly misleading and disastrous way of thinking: namely, that there is some “meaning” in the minds of users to be given formal recognition and that there is some formal “meaning” that the DBMS “understands” which is an approximation of the former."
DeleteFrom "Introduction to Database Systems", 8th ed. :
"The first and most significant point is that, while internal predicates are a formal construct, external predicates are an informal construct merely."
"The system has to understand -indeed, can only understand- the internal ones"
The sentence a couple of lines down where the suggestion of "approximation" is made contains the very important word "loosely".
The system can "understand" things like "if the shop is not closed, then the alarm must not be set". Internal predicate (*). But it cannot understand "The shop is closed". External predicate.
(*) Sidestepping the issue here that the predicate is given in natural language and that of course it has some equivalent, and is actually stated, in the more formal and dbms-understood language of constraint declarations.
Date's own writings invariably confirm your qualification of the idea you describe : nonsense.
> "The system has to understand -indeed, can only understand- the internal ones"
DeleteDMcG was referring to the EFFECT of CJD's conceptualization, not to it itself.
Be that as it may, the core issue is the completeness of the formalization of the informal predicate--CJD's makes relations too abstract to be useful.
David McGoveran replies to Smout:
DeleteErwin's quoting of Introduction to Database Systems illustrates that my objections have not been understood as intended. My point is that the internal predicate vs external predicate terminology is flawed from the beginning. There is and can be no such thing as an external predicate, formal or informal. The concept is meaningless and completely useless, while suggesting that some "informal construct" should be considered. It should not. In that direction lies confusion. Furthermore, the "internal" adjective of "internal predicate" already suggests that the internal predicate is distinguishable from other kinds of predicate and defining it as a "formal predicate" suggests that predicates can be informal. They cannot. We can only have informal expressions of formal predicates. The system does not understand anything--it can only mechanically follow the rules of deductive inference or execute evaluation algorithms. There is no approximation to be made, loosely speaking or otherwise, because there is nothing that is not the predicate given by the database designer to approximate. This way of speaking (writing) confuses within every sentence the distinctions between subject language and object language, between conceptual model and logical data model. To be clear, I am not asserting that Chris fails to understand, only that the choice of explanation is confused and confusing. Lastly, we must understand that the relational algebra--which is the language most often used for declarative constraints - is not the same as FOPL, but a subset of it. To construct a relation predicate, we need the full expressive power of FOPL. Simply constructing the conjunction of declarative constraints expressed in the relational algebra in insufficient for expressing the set specification of a relation (i.e., the relation's intensional definition) which I call the relation predicate).
When Chris tries to explain relation predicates in terms of natural language expressions (however silted) and then to give examples of their application to the problem of updating, the explanation is fatally crippled and inherently confusing. While he can and might get the gist of the approach through such a pedagogical device, it does not suffice when analyzing the effectiveness of the approach or uncovering and addressing any difficulties in the procedure. It is rather like trying to explain algebra without using symbols.
Erwin's final sentence is rather terse and its intent completely unclear, especially the terminal "nonsense".
" I capture this idea in a special, primitive predicate I call the “assertion predicate”. "
DeleteI look forward to hearing more about this "assertion predicate". Whilst awaiting a full write-up, is it possible to give an example?
The merit of CJD's "relvar predicate" is it gives "authorized users" clear rules as to what they are asserting when they instruct the DBMS to insert tuple(s) or are denying when they instruct the DBMS to delete tuple(s).
I am of course assuming the data designer has done a disciplined job in both designing the schema and interpreting the business rules into those relvar predicates.
Which is why we have insisted on a wait for the book where the the exposition of the RDM will be formal. A bit harder to wave arms with a formalism.
DeleteThat assumption is valid in a world where designers understand the RDM, database design and the business domain they model and not the world we inhabit.